

 synthesize program


4f2accafe6fa355624f3ee42207cc7b8-Supplemental-Conference.pdf

Neural Information Processing Systems

A.1 Domain-Specific Language (DSL) Specifications

Table 5 shows the domain-specific language (DSL) designed for E-MAPP in the Overcooked-v2 environment. Each convolutional layer has a kernel size of 3 except for the first one, which has a kernel size of 5. The inventory state s_inv is encoded by a three-layer MLP with hidden size 128 for all layers. The output goal feature f_goal is a 640-dim feature vector.

Name                Value
learning rate       3e-4
update batch size   128

In cooperative settings, the goal input of the assistive agent is the leading agent's goal.
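
The architecture details in this snippet can be collected into a small sketch. This is only an illustration under stated assumptions, not the authors' implementation: the inventory input size (`INVENTORY_DIM`), the ReLU activations, the weight initialization, and all function names are hypothetical, and the convolutional stack is represented only by its kernel sizes because the snippet gives no channel counts or depth.

```python
import random

# Values stated in the snippet.
HIDDEN_SIZE = 128        # hidden size of every MLP layer
GOAL_FEATURE_DIM = 640   # dimensionality of the goal feature f_goal
LEARNING_RATE = 3e-4
UPDATE_BATCH_SIZE = 128

# Hypothetical value, not given in the snippet.
INVENTORY_DIM = 8


def conv_kernel_sizes(num_layers):
    """Kernel size 5 for the first conv layer, 3 for every later layer."""
    return [5 if i == 0 else 3 for i in range(num_layers)]


def init_linear(in_dim, out_dim, rng):
    """Create one dense layer as (weights, biases) with small random weights."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
               for _ in range(out_dim)]
    biases = [0.0] * out_dim
    return weights, biases


def linear(layer, x):
    """Apply a dense layer to the input vector x."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def relu(x):
    return [max(0.0, v) for v in x]


def init_inventory_mlp(rng):
    """Three dense layers, each with 128 output units."""
    dims = [INVENTORY_DIM, HIDDEN_SIZE, HIDDEN_SIZE, HIDDEN_SIZE]
    return [init_linear(a, b, rng) for a, b in zip(dims[:-1], dims[1:])]


def encode_inventory(mlp, s_inv):
    """Encode the inventory state; ReLU between layers is an assumption."""
    h = s_inv
    for i, layer in enumerate(mlp):
        h = linear(layer, h)
        if i < len(mlp) - 1:
            h = relu(h)
    return h
```

Calling `encode_inventory(init_inventory_mlp(random.Random(0)), [1.0] * INVENTORY_DIM)` yields a 128-dimensional vector. How that vector is combined with the convolutional features to form the 640-dim f_goal is not described in the snippet, so it is left out here.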






Learning to Synthesize Programs as Interpretable and Generalizable Policies

Neural Information Processing Systems

Recently, deep reinforcement learning (DRL) methods have achieved impressive performance on tasks in a variety of domains. However, neural network policies produced with DRL methods are not human-interpretable and often have difficulty generalizing to novel scenarios. To address these issues, prior works explore learning programmatic policies that are more interpretable and structured for generalization. Yet, these works either employ limited policy representations (e.g. [...] We present a framework that instead learns to synthesize a program, which details the procedure to solve a task in a flexible and expressive manner, solely from reward signals. To alleviate the difficulty of learning to compose programs to induce the desired agent behavior from scratch, we propose to first learn a program embedding space that continuously parameterizes diverse behaviors in an unsupervised manner and then search over the learned program embedding space to yield a program that maximizes the return for a given task.
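
The second stage described in this abstract, searching a learned embedding space for the program with the highest return, can be sketched as follows. The abstract does not name a search algorithm, so the cross-entropy method used here is an assumption chosen for illustration. The names `cem_search`, `toy_return`, and `TARGET` are hypothetical, and `toy_return` is a stand-in: in a real system the evaluation would decode the embedding into a program, execute it in the environment, and report the return.

```python
import random


def cem_search(evaluate, dim, iterations=30, population=100,
               elite_frac=0.1, init_std=1.0, seed=0):
    """Cross-entropy method: repeatedly sample embeddings, keep the best, refit."""
    rng = random.Random(seed)
    mean = [0.0] * dim
    std = [init_std] * dim
    n_elite = max(1, int(population * elite_frac))
    best_z, best_ret = None, float("-inf")

    for _ in range(iterations):
        # Sample candidate embeddings from the current Gaussian.
        candidates = [[rng.gauss(m, s) for m, s in zip(mean, std)]
                      for _ in range(population)]
        # Score each candidate and sort from highest to lowest return.
        scored = sorted(((evaluate(z), z) for z in candidates),
                        key=lambda pair: pair[0], reverse=True)
        if scored[0][0] > best_ret:
            best_ret, best_z = scored[0]
        # Refit the Gaussian to the elite candidates.
        elites = [z for _, z in scored[:n_elite]]
        mean = [sum(z[i] for z in elites) / n_elite for i in range(dim)]
        std = [(sum((z[i] - mean[i]) ** 2 for z in elites) / n_elite) ** 0.5
               + 1e-3  # small floor so sampling never collapses entirely
               for i in range(dim)]

    return best_z, best_ret


# Hypothetical stand-in for "decode the embedding, run the program, get return".
TARGET = [0.5, -1.0, 2.0, 0.0]


def toy_return(z):
    """Higher is better; the maximum of 0 is reached when z equals TARGET."""
    return -sum((a - b) ** 2 for a, b in zip(z, TARGET))


if __name__ == "__main__":
    z, ret = cem_search(toy_return, dim=len(TARGET))
    print("best embedding:", [round(v, 3) for v in z])
    print("best return:", round(ret, 5))
```

Because the search only needs a scalar return for each candidate, it is compatible with learning "solely from reward signals": no gradients flow through the program executor.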


The Premature Obituary of Programming

Communications of the ACM

Deep learning (DL) has arrived, not only for natural language, speech, and image processing but also for coding, which I refer to as deep programming (DP). DP is used to detect similar programs, find relevant code, translate programs from one language to another, discover software defects, and synthesize programs from a natural language description. Large transformer language models [10] are now being applied to programs with encouraging results. Just as DL is enabled by the enormous amount of textual and image data available on the Internet, DP is enabled by the vast amount of code available in open source repositories such as GitHub, as well as the ability to reuse libraries via modern package managers such as npm and pip. The former is used in the GitHub Copilot project [14], which integrates with development environments to automatically suggest code to developers.